
Conversation

@Chenyaaang
Collaborator

Description

The implementation of Pipeline Parallelism is split into the following small PRs.

  • Util functions for PP on Jax (Create Utils functions for PP #1042)
  • Worker changes (the current PR)
  • Runner changes
  • Platform changes
  • torchax changes
  • Jax changes
  • Single-host changes (this will be in vLLM)
  • Multi-host changes (Ray)

This PR modifies the Jax worker to support PP.

  • The worker __init__ takes in the current worker's IP and its previous worker's IP, so it can start the transfer server and open the connection later.
  • During execute_model, PP workers that are not the first rank need to receive intermediate tensors from the previous worker, and PP workers that are not the last rank need to send intermediate tensors to their next worker (see the sketch after this list).
  • The profiler should profile every PP worker, so subfolders are added under the parent profile_dir to save the profiles for each worker.
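
For orientation, below is a minimal sketch of the flow described in the first two bullets. The class name and the transfer helpers are illustrative placeholders, not the actual APIs added in this PR; the real code presumably builds on the PP utilities from #1042.

# Illustrative sketch only; the transfer helpers are assumed names.
class PPWorkerSketch:

    def __init__(self, ip, prev_worker_ip, rank, pp_world_size):
        self.rank = rank
        self.pp_world_size = pp_world_size
        self.is_first_rank = rank == 0
        self.is_last_rank = rank == pp_world_size - 1
        # Start a transfer server on this worker's IP and connect to the
        # previous worker so intermediate tensors can be exchanged later.
        self._start_transfer_server(ip)
        if not self.is_first_rank:
            self._connect_to(prev_worker_ip)

    def execute_model(self, scheduler_output, model_runner):
        intermediate_tensors = None
        if not self.is_first_rank:
            # Ranks after the first wait for activations from the previous
            # worker before running their model partition.
            intermediate_tensors = self._recv_intermediate_tensors()
        output = model_runner(scheduler_output, intermediate_tensors)
        if not self.is_last_rank:
            # Ranks before the last forward their activations to the next
            # worker and return None instead of sampled tokens.
            self._send_intermediate_tensors(output)
            return None
        return output

    # Placeholders standing in for the real transfer utilities.
    def _start_transfer_server(self, ip): ...
    def _connect_to(self, prev_worker_ip): ...
    def _recv_intermediate_tensors(self): return None
    def _send_intermediate_tensors(self, tensors): ...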

Tests

An E2E test has verified that the whole PP implementation works properly.

Checklist

Before submitting this PR, please make sure:

  • I have performed a self-review of my code.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have made or will make corresponding changes to any relevant documentation.

@github-actions

github-actions bot commented Nov 7, 2025

Description

Start with a short description of what the PR does and how this is a change from
the past.

The rest of the description includes relevant details and context, examples:

  • why is this change being made,
  • the problem being solved and any relevant context,
  • why this is a good solution,
  • some information about the specific implementation,
  • shortcomings of the solution and possible future improvements.

If the change fixes a bug or a Github issue, please include a link, e.g.,:
FIXES: b/123456
FIXES: #123456

Tests

Please describe how you tested this change, and include any instructions and/or
commands to reproduce.

Checklist

Before submitting this PR, please make sure:

  • I have performed a self-review of my code.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have made or will make corresponding changes to any relevant documentation.

    self.step_counter += 1
    return None
else:
    self.step_counter += 1

Collaborator

why do we need this step_counter?

Collaborator Author

It's used to generate a UUID. We want it to be unique for each run and each worker, so we hash scheduler_output, the step, and the worker rank.
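
As a rough illustration of that scheme (the function name and the exact hashing are assumptions, not the code in this PR):

import hashlib
import uuid

def transfer_uuid(scheduler_output, step: int, worker_rank: int) -> uuid.UUID:
    # Hash the scheduler output together with the step counter and the
    # worker rank so the id is unique per step and per worker.
    digest = hashlib.sha256(
        f"{scheduler_output}:{step}:{worker_rank}".encode()).digest()
    return uuid.UUID(bytes=digest[:16])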

Collaborator

@yixinshi Nov 13, 2025

It would be better to explain this at line 125.

multihost_backend = os.environ.get("TPU_MULTIHOST_BACKEND", "").lower()
if multihost_backend != "ray" and self.parallel_config.pipeline_parallel_size > 1:
    # Note: Below is the setting for v6e8 host (8 chips of v6e)
    # There are 2 ways of subslicing a v6e:

Collaborator

Do we need to report errors if the settings don't match either of these 2 ways?

Collaborator Author

I use v6e8 as an example to show 2 ways to subslice the chips. I was thinking that if the customer is using other chips, they should replace this with their own topology. Do you have any better idea for handling this?

Collaborator

Can the topology be passed as a config variable (with the default set to one of v6e's supported topologies), or at least as a parameter of init_device(), so people can find the needed changes more easily? And please move lines 136-141 to line 152.
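
Roughly what that suggestion could look like (the subslice_topology config name and the 2x4 default are assumptions for illustration only):

DEFAULT_SUBSLICE_TOPOLOGY = "2x4"  # one of v6e's supported topologies

def resolve_subslice_topology(parallel_config) -> str:
    # Prefer an explicit config value; fall back to the v6e default so users
    # on other chip types only need to change a single setting instead of
    # editing hard-coded constants in init_device().
    return getattr(parallel_config, "subslice_topology",
                   DEFAULT_SUBSLICE_TOPOLOGY)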

@Chenyaaang
Collaborator Author

@sixiang-google Hi Xiang, can you help take a look at init_device()? Will my change to local_devices affect disagg serving? Thanks

# For PP, we use MPMD so we want to profile every worker.
if self.pp_world_size > 1 and envs.VLLM_TORCH_PROFILER_DIR:
    self.profile_dir = os.path.join(envs.VLLM_TORCH_PROFILER_DIR,
                                    f"rank_{self.rank}")

Collaborator

Not sure about the convention here, but this might be more informative: f"rank_{self.rank}_{self.pp_world_size}"


assert jax.local_device_count(
) >= sharding_config.total_devices
self.devices = jax.local_devices()[:sharding_config.
                                   total_devices]

Collaborator

nit: combine lines 183 and 184 to improve readability?

    self.rank, self.rank == 0,
    self.rank == self.pp_world_size - 1)
logger.info(f"Init worker | "
            f"rank={self.rank} | "

Collaborator

Add world_size as well?

Collaborator

@yixinshi left a comment

Good work! A general comment: shall we have a more specific PR title for each of these PRs?
